bag of words

Terms from Artificial Intelligence: humans at the heart of algorithms

Page numbers are for draft copy at present; they will be replaced with correct numbers when final book is formatted. Chapter numbers are correct and will not change now.

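A document consists of structured text: sentences, paragraphs, and meaningful phrases. Bag-of-words techniques ignore this structure, reducing the document to the collection of words it contains with a frequency count for each. Word order is lost, so two documents that use the same words the same number of times have identical bags. The resulting word counts are used as input to similarity metrics such as the Jaccard similarity and cosine similarity.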
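The reduction and the two similarity metrics can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation. The function names (bag_of_words, jaccard_similarity, cosine_similarity) and the example sentences are made up for this sketch. It also assumes a very simple tokenisation (lowercase, runs of the letters a to z) and computes the Jaccard similarity on the sets of distinct words, ignoring the counts.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Reduce text to a word -> frequency mapping, discarding order and structure.

    Assumption for this sketch: a word is any run of the letters a-z after lowercasing.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


def jaccard_similarity(bag_a, bag_b):
    """Shared distinct words divided by all distinct words (counts are ignored)."""
    vocab_a, vocab_b = set(bag_a), set(bag_b)
    union = vocab_a | vocab_b
    if not union:
        return 0.0  # two empty documents: treated as 0 by convention in this sketch
    return len(vocab_a & vocab_b) / len(union)


def cosine_similarity(bag_a, bag_b):
    """Cosine of the angle between the two word-count vectors."""
    # A Counter returns 0 for a missing word, so words absent from bag_b add nothing.
    dot = sum(count * bag_b[word] for word, count in bag_a.items())
    norm_a = math.sqrt(sum(count * count for count in bag_a.values()))
    norm_b = math.sqrt(sum(count * count for count in bag_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical example documents.
doc_a = "The cat sat on the mat."
doc_b = "The mat sat on the cat."   # same words as doc_a, different order
doc_c = "The dog sat on the log."

bag_a, bag_b, bag_c = (bag_of_words(d) for d in (doc_a, doc_b, doc_c))

print(bag_a)
print(jaccard_similarity(bag_a, bag_b), cosine_similarity(bag_a, bag_b))
print(jaccard_similarity(bag_a, bag_c), cosine_similarity(bag_a, bag_c))
```

Because doc_a and doc_b contain the same words with the same counts, their bags are identical and both metrics treat them as the same document, even though the sentences mean different things. For doc_a and doc_c, three of the seven distinct words are shared, giving a Jaccard similarity of 3/7 and a cosine similarity of about 0.75.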

Used on Chap. 10: page 213